To contribute to the development of the transcription map of human chromosome 21 (HC21), we have used exon trapping from pools of HC21-specific cosmids. Using trapped exons, we have identified a novel gene (named TMPRSS2) that encodes a multimeric protein with a serine protease domain. The TMPRSS2 mRNA is expressed strongly in small intestine and weakly in several other tissues. The full-length cDNA encodes a predicted protein of 492 amino acids that contains the following domains: (i) A serine protease domain (aa 255-492) of the S1 family that probably cleaves at Arg or Lys residues. (ii) An SRCR (scavenger receptor cysteine-rich) domain (aa 149-242) of group A (6 conserved Cys). This type of domain is involved in the binding to other cell surface or extracellular molecules. (iii) An LDLRA (LDL receptor class A) domain (aa 113-148). This type of domain forms a binding site for calcium. (iv) A predicted transmembrane domain (aa 84-106). No typical signal peptide was recognized. The gene was mapped to 21q22.3 between markers ERG and D21S56 in the same P1 as MX1. The physiological role of TMPRSS2 and its involvement in trisomy 21 phenotypes or monogenic disorders that map to HC21 are unknown. © 1997 Academic Press

INTRODUCTION

Human chromosome 21 (HC21) is the smallest chromosome, with a long arm (21q) of around 40 Mb, containing approximately 600-1000 genes (reviewed in Antonarakis, 1993) and a short arm (21p) of around 10-15 Mb, which is highly homologous to those of the other four human acrocentric chromosomes. To date, about 75 HC21 genes have been cloned and partially characterized [Genome DataBase, http://gdbwww.gdb.org, and SWISS-PROT, http://www.expasy.ch]. Trisomy for chromosome 21 is the most common chromosomal abnormality at birth, leading to the phenotypes known as Down syndrome (Epstein, 1989). In addition, the loci for several monogenic disorders have been mapped to HC21. Dense linkage maps and almost complete physical maps of 21q have already been obtained and are now extensively used for the characterization of HC21 genes and the efforts to determine the nucleotide sequence of HC21. The cloning and characterization of HC21 genes are a necessary step for the understanding of Down syndrome and the molecular etiology of monogenic disorders mapping on this chromosome.

In our laboratory, systematic exon-trapping experiments have been performed to identify portions of HC21 genes, clone and characterize the corresponding full-length cDNAs and genes, and participate in the international effort to create a transcription map of HC21 (Cheng et al., 1994; Peterson et al., 1994; Tassone et al., 1994; Lucente et al., 1995; Chen et al., 1996). We report here the cloning of a novel serine protease gene (TMPRSS2), which is expressed mainly in the small intestine, but also in lower levels in several other tissues, and which maps to 21q22.3. The predicted polypeptide of TMPRSS2 also contains a transmembrane domain, a scavenger receptor cysteine-rich (SRCR) domain, and an LDL receptor class A (LDLRA) domain, and it probably belongs to the type II integral membrane proteins. The TMPRSS2 gene is homologous to, but different from, the human enteropeptidase gene, which maps to a different region of HC21 (21q21).

Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. U75329 (cDNA) and X88229, X88228, X88321, X88043, and X88047 (trapped exons).

To whom correspondence should be addressed at Division de Génétique Médicale, Centre Médical Universitaire, 1211 Genève 4, Switzerland. Telephone: 41-22-702570/. Fax: 41-22-7025706. E-mail: Stylianos.Antonarakis@medecine.unige.ch.

MATERIALS AND METHODS

Exon Trapping

Pools of chromosome 21-specific cosmids from the LL21NC02 library (kindly supplied by P. de Jong) were used in exon-trapping experiments (Buckler et al., 1991; Church et al., 1994; Gibco BRL Manual 18449-017). EcoRI- and PstI-digested cosmids were subcloned into pSPL3 vector, and plasmid DNA was used to transfect Cos7 mammalian cells using lipofectACE (Gibco BRL). Total RNA was isolated from Cos7 cells 24 h after transfection, cDNA was synthesized, and PCR products were subcloned into pAMP10 vector by UDG (uracil DNA glycosylase) cloning. After elimination of cryptically spliced, pSPL3-derived clones by oligonucleotide screening, the inserts of individual pAMP10 clones were subjected to nucleotide sequencing on an ABI373A automated sequencer by dideoxy terminator fluorescence method using Taq polymerase. Nucleic acid and amino acid homologies of the resulting sequences were analyzed through BLASTN and BLASTX searches of the nonredundant database (Altschul et al., 1990).

Cloning of TMPRSS2 cDNA

The 216-bp PCR product derived from trapped exon HMC26A01 with oligonucleotide primers (26A01A, 5'-GCCTGCGGGGTCAACTTGAAC-3', and 26A01B, 5'-GGCGGCTGTCACGATCCACTC-3') was used as a probe to screen approximately 500,000 clones of a human heart λgt10 cDNA library (Clontech HL3026a). One positive clone (APG1) was isolated, and the 2.4-kb insert was subcloned into the pAMP10 vector and sequenced in both directions using standard oligonucleotide walking protocols for the ABI373 automated sequencer. The nucleotide sequence was verified using RT-PCR products from intestine poly(A)+ mRNA.
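As a quick sanity check on the probe primers listed above, their length and GC content can be tabulated. This is only an illustrative sketch: the primer sequences are copied from the text, while the helper name `gc_fraction` and the choice of GC content as a summary statistic are assumptions and not part of the original protocol.

```python
# Primer sequences are copied from the Methods text. The helper name
# gc_fraction is an illustrative assumption, not part of the original study.
PRIMERS = {
    "26A01A": "GCCTGCGGGGTCAACTTGAAC",
    "26A01B": "GGCGGCTGTCACGATCCACTC",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


for name, seq in PRIMERS.items():
    # Report length and GC fraction for each primer.
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
```

Both primers are 21 nucleotides long and fairly GC-rich, which is consistent with their use in a standard PCR.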

Chromosomal Mapping 

Two independent methods were used to assign TMPRSS2 to a human chromosome. First, PCR amplification of the trapped exon HMC26A01 with specific oligonucleotide primers (26MAP1, 5'-GAGGCTTCTGCAGCTTCATC-3', and 26MAP2, 5'-CAATCCATGGCATTGGACGG-3') was performed on the genomic DNA from a panel of somatic cell hybrids with defined segments of HC21. Second, the insert of the initial trapped exon HMC26A01 was used to probe high-density filters of cosmids from the HC21-specific LL21NC02 library. Finally, PCR amplification using either oligonucleotide primers 26MAP1 and 26MAP2 or 26A01A and 26A01B was used on DNAs from a panel of HC21-derived YACs.
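The relationship between a primer pair and the size of its PCR product can be sketched as follows. The primer sequences are those given above, and the 155-bp size is the expected product reported in the Results. The template is a hypothetical placeholder (the intervening exon sequence is not reproduced in this section, so it is filled with `N`), and the function names are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product amplified from the top strand of template.

    The forward primer matches the top strand directly; the reverse primer
    anneals where the top strand carries its reverse complement.
    Returns None if either site is missing.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return site + len(reverse) - start


FORWARD = "GAGGCTTCTGCAGCTTCATC"  # 26MAP1, from the text
REVERSE = "CAATCCATGGCATTGGACGG"  # 26MAP2, from the text

# Hypothetical template: 115 unknown bases between the primer sites, chosen
# only so that the toy product matches the 155-bp size stated in the paper.
toy_template = "TTTT" + FORWARD + "N" * 115 + reverse_complement(REVERSE) + "AAAA"

print(reverse_complement(REVERSE))
print(amplicon_length(toy_template, FORWARD, REVERSE))
```

With a real genomic sequence in place of the placeholder, the same function would report the product size expected for any primer pair.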

5'- and 3'-RACE (Rapid Amplification of cDNA Ends)

To obtain the 5' end of the TMPRSS2 cDNA, 5'-RACE was performed on human small intestine cDNA. From 1 µg of poly(A)+ RNA (Clontech 6547-1) cDNA was made with the Marathon cDNA Amplification kit (K1802-1), and 5'-RACE using nested PCR primers was carried out with the enzyme Taq Expand High Fidelity (Boehringer Mannheim) according to the manufacturer's protocol. The gene-specific primers were 26A01B (see above) and AP26BB (5'-CCGCTGTCATCCACTATTCC-3'). In two different experiments the same PCR product of 670 bp was generated and subjected to nucleotide sequencing. 3'-RACE was carried out using gene-specific primers AP26G (5'-GGTTCTGGCTGTGCCAAAGC-3') and AP26K (5'-GTCTGGCTTTGGCACTCTCTGC-3'), and a PCR product of approximately 2.0 kb was generated.
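For readers comparing the gene-specific RACE primers, a rough melting-temperature estimate can be computed with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. This rule of thumb is not mentioned in the paper and is used here only as an assumed, coarse approximation; the primer sequences are taken from the text and the function name is hypothetical.

```python
# Gene-specific RACE primers, copied from the text.
RACE_PRIMERS = {
    "26A01B": "GGCGGCTGTCACGATCCACTC",
    "AP26BB": "CCGCTGTCATCCACTATTCC",
    "AP26G": "GGTTCTGGCTGTGCCAAAGC",
    "AP26K": "GTCTGGCTTTGGCACTCTCTGC",
}


def wallace_tm(seq: str) -> int:
    """Approximate Tm (degrees C) by the Wallace rule: 2(A+T) + 4(G+C).

    This is a coarse rule of thumb for short oligonucleotides and is an
    assumption of this sketch, not the method used in the original study.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains non-ACGT characters")
    return 2 * at + 4 * gc


for name, seq in RACE_PRIMERS.items():
    print(f"{name}: {len(seq)} nt, Wallace Tm ~ {wallace_tm(seq)} C")
```

The estimates are only meant for relative comparison between primers; annealing conditions in practice depend on the polymerase and buffer system.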

Northern Blot Analysis 

The cDNA clone APG1 containing the complete coding sequence was used to probe two Northern blots, each containing poly(A)+ RNA from eight human adult tissues (Clontech 7759-1, Clontech 7760-1), and one containing four fetal tissues (Clontech 7756-1). Northern
Blot analysis was performed using standard protocols, with high- 
stringency washing. A control hybridization using a human actin 
probe was used for determination of the amount of RNA loaded in 
these Northern blots. 

Comparative Protein Modeling 

The sequences of both LDLRA and protease domains of TMPRSS2 were submitted to the SWISS-MODEL automated comparative protein modeling server (Peitsch, 1995, 1996). The models were made as follows:

LDLRA domain. SWISS-MODEL could not automatically provide a 3D structure of this domain since the degree of identity with the most similar sequence of known 3D structure was less than 30%. Using BLAST (Altschul et al., 1990), we identified the Brookhaven Protein Data Bank entry 1LDL (NMR structure of the LDLR domain) (Daly et al., 1995) as the suitable modeling template. We then aligned the TMPRSS2 LDLRA domain with the sequence of 1LDL and submitted the sequence alignment to SWISS-MODEL using the Optimise mode.

Serine protease domain. This domain was modeled using the First Approach mode of SWISS-MODEL, which provides fully automated template identification and multiple sequence alignment prior to model building. Chymotrypsin (P17538) was identified as a suitable modeling template. The template and TMPRSS2 protease sequences were automatically aligned and the model generation proceeded to the end without human intervention. Sequence-to-structure fitness analysis using both 3D-1D profiles (Lüthy et al., 1992) and ProsaII (Sippl, 1993) did not show any obvious discrepancies. The coordinates of both the LDLRA and the serine protease domain of TMPRSS2 can be found in the SWISS-MODEL Repository (http://www.expasy.ch/swissmod/swmr-top.html).
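The 30% identity threshold mentioned for the LDLRA domain can be illustrated with a small helper that scores a pairwise alignment. This is a minimal sketch under stated assumptions: identity is computed over columns where neither sequence has a gap (other definitions exist), the function names are hypothetical, and the example strings are toy sequences rather than the real TMPRSS2 or 1LDL sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither side is a gap.

    The choice of denominator is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def needs_manual_template(identity: float, cutoff: float = 30.0) -> bool:
    """True when identity falls below the 30% level cited in the text."""
    return identity < cutoff


# Toy aligned sequences (hypothetical, for illustration only).
toy_query = "ACDE-F"
toy_template = "ACDQGF"
identity = percent_identity(toy_query, toy_template)
print(identity, needs_manual_template(identity))
```

In the workflow described above, a result below the cutoff corresponds to the case where a template had to be selected and aligned by hand before submission.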

RESULTS 

Exon Trapping Identified a Clone with Homology to 
Human Proteases 

To clone partial gene sequences from human chromo- 
some 21 we have used pools of cosmids (from the 
LL21NC02-Q library) in an exon-trapping experiment 
and have identified more than 550 different potential 
exons (Chen et al., 1996). One trapped sequence 
HMC26A01 (GenBank X88229) of 216 bp showed a 
strong homology to a large list of serine proteases from 
human and other species. BLASTX analysis, for exam- 
ple, revealed a 55% amino acid identity to human 
prostasin (GenBank L41351; P = 1.3e-15). Other representative homologies included human elastase (P08218), Erinaceus europaeus plasminogen (U33171), and pig coagulation factor IX (P16293). Because this HMC26A01 trapped sequence was probably derived from an undescribed human serine protease, we set out to clone and initially characterize the full-length cDNA of the corresponding human gene.
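A BLASTX search compares a nucleotide query, translated in all six reading frames, against protein sequences. The translation step can be sketched as below. The standard genetic code is used; the function names are assumptions, and the example input is the 26A01A primer from the Methods, translated in an arbitrary frame that is not claimed to be the reading frame of the gene.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence from the given frame offset (0, 1, or 2).

    Codons containing unknown bases are rendered as 'X'; '*' marks a stop.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> list:
    """Return translations of the three forward and three reverse frames."""
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    return [translate(s, f) for s in (forward, reverse) for f in range(3)]


example = "GCCTGCGGGGTCAACTTGAAC"  # primer 26A01A, from the Methods
for frame_number, peptide in enumerate(six_frame_translation(example), start=1):
    print(frame_number, peptide)
```

Each translated frame would then be aligned against the protein database; the frame giving the significant protease matches indicates the likely coding frame of the trapped exon.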

Isolation of Full-Length TMPRSS2 Coding Sequences 

Clone HMC26A01 was used to screen approximately 500,000 clones of a human heart λgt10 cDNA library (this library was chosen because of the expression pattern in Northern blots; see below). One positive clone (APG1), containing a 2.4-kb-long insert, was obtained, subcloned into the pAMP10 vector, and subjected to nucleotide sequencing. 5'-RACE from intestinal mRNA (again chosen because of the expression pattern) using oligonucleotides close to the 5' end of the APG1 clone extended the 5'UTR sequence by about 150 nucleotides. Sequence analysis from both strands revealed an open reading frame of 492 amino acids starting from the most N-terminal methionine codon. The 3'UTR from the original clone APG1 was approximately 0.95 kb. Figure 1 shows the complete nucleotide and predicted amino acid sequence of TMPRSS2. This cDNA was verified by RT-PCR amplifications from intestinal RNA using pairs of oligonucleotide primers from the cDNA sequence. Interestingly, no ESTs identical to portions of the TMPRSS2 cDNA sequence were identified in the dbEST database of GenBank (search of February 18, 1997). A number of additional exons from the Chen et al. (1996) study were identical to portions of the TMPRSS2 cDNA, including HMC44E11 (GenBank X88043), HMC26A05 (GenBank X88228), HMC19A07 (GenBank X88321), and HMC44D02 (GenBank X88047).

FIG. 2. Intron/exon junctions of the TMPRSS2 gene as determined by comparison of the cDNA sequence to the publicly available sequences of the human P1 clone 35-H5-C8 (Martin et al., 1994; GenBank Accession Nos. L35675-L35682).

Intron/Exon Junctions 

Homology searches with sequences available in the public databases revealed identity of discontinuous regions of the TMPRSS2 cDNA with portions of human P1 clone 35-H5-C8, which was sequenced by Martin and co-workers (Martin et al., 1994; GenBank Accession Nos. L35675-L35682). The comparison of the cDNA sequence of TMPRSS2 with the genomic sequence of human P1 revealed intron/exon junctions that are shown in Fig. 2. Not all such junctions are reported in the figure since the sequence of the entire P1 clone was not available in the public databases. It is likely that there are additional introns 5' to codon 110 and between codons 191 and 229 and codons 241 and 301.

Mapping of TMPRSS2 to Chromosome 21

PCR amplification was performed with oligonucleo- 
tide primers 26MAP1 and 26MAP2 on genomic DNA 
from rodent-human somatic cell hybrids that con- 
tained either single human chromosomes (NIGMS 2; 
Drwinga et al, 1993) or specific segments of HC21 (Pat- 
terson et al, 1993). The expected 155-bp PCR product 
was present in somatic cell hybrids WAV17, E7b, 725, 
2Furl, R50-3, GA9-3, 9528C-1, 1881C-13b, 8q-, ACEM 
2-10d, JC6A, and 1x4; in contrast, somatic cell hybrids 21q+, 6918-8al, and MRC2-G did not show amplification (data not shown). These data localized this human protease to the region 21q22.3 between markers ERG and D21S56 (Fig. 3).
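The logic of localizing a gene with a hybrid panel can be sketched as a set operation: the gene must lie in a segment retained by every PCR-positive hybrid and absent from every PCR-negative hybrid. The sketch below uses entirely hypothetical hybrid names and interval labels (`H1`, `I3`, and so on), because the chromosome 21 content of the real hybrids is not given in this section; it shows the reasoning only, not the actual panel.

```python
def localize(positive_hybrids, negative_hybrids):
    """Return intervals consistent with the PCR results.

    Each argument is a list of sets; every set holds the chromosome
    intervals retained by one hybrid. Candidate intervals are present in
    all positive hybrids and in none of the negative ones.
    """
    if not positive_hybrids:
        raise ValueError("at least one positive hybrid is required")
    candidates = set.intersection(*positive_hybrids)
    for retained in negative_hybrids:
        candidates -= retained
    return candidates


# Hypothetical toy panel: intervals I1..I5 ordered along the chromosome.
toy_positive = [
    {"I2", "I3", "I4"},  # hybrid H1, PCR product present
    {"I3", "I4", "I5"},  # hybrid H2, PCR product present
]
toy_negative = [
    {"I1", "I2"},        # hybrid H3, no amplification
    {"I4", "I5"},        # hybrid H4, no amplification
]

print(localize(toy_positive, toy_negative))
```

In this toy example only interval `I3` remains, which mirrors how the positive and negative hybrids in the text narrow the gene to a single region between two markers.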

We used exon HMC26A01 to probe a subset of the cosmid library LL21NC02. One cosmid, Q20A3, was identified as positive. PCR on this cosmid with the same primers 26MAP1 and 26MAP2 produced the expected 155-bp fragment, confirming that Q20A3 contained this exon of the TMPRSS2 gene. Yeast DNA from 79 YAC clones, chosen to cover almost all of HC21 (Chumakov et al., 1992), was used for PCR amplification with the two pairs of oligonucleotide primers 26MAP1-26MAP2 and AP26G (5'-GGTTCTGGCTGTGCCAAAGC-3')-AP26H (5'-CCAATGTGCAGGTGGAGACC-3') in the 3'UTR region. No positive YACs were identified. Many single YACs in 21q22.3 from the collection of Chumakov et al. (1992) were also tested by PCR with these primers and no amplification was observed. The absence of positive YACs for this human TMPRSS2 gene suggests either that the HC21 contig (Chumakov et al., 1992) in the region between markers ERG and D21S56 contains at least one gap or that the YAC clones available to our laboratory have accumulated deletions.

FIG. 3. Schematic representation of the mapping position of the TMPRSS2 gene on chromosome 21 as determined by PCR amplification of somatic cell hybrids and sequence identities with a chromosome 21 P1 clone (see Results). Representative results from PCR amplification using oligonucleotide primers 26MAP1/26MAP2 (see text) are also shown.

FIG. 4. Northern blot analysis using the TMPRSS2 cDNA as hybridization probe. The RNA filters are from Clontech (Cat. Nos. 7750-[illegible]). The thick arrow shows the 3.8-kb mRNA species, while the thin arrow depicts the faint 2.0-kb mRNA.

As described above, discontinuous regions of the TMPRSS2 cDNA were identical to portions of human P1 clone 35-H5-C8, which was sequenced by Martin and co-workers (Martin et al., 1994; GenBank Accession Nos. L35675-L35682). This P1 also contained gene MX1, which maps to 21q22.3 in the interval between ERG and D21S56 (Fig. 3). Therefore, this sequence identity of TMPRSS2 with portions of P1 35-H5-C8 is in agreement with the mapping position obtained using the somatic cell hybrids.

Northern Blot Analysis 

The insert of cDNA clone APG1 was used as a probe against three filters containing 2 µg of poly(A)+ RNA from 16 human adult tissues and 4 human fetal tissues. A hybridization signal corresponding to an mRNA species of approximately 3.8 kb was detected (Fig. 4). The difference between the 2.4-kb cDNA clone APG1 and the 3.8-kb RNA species detected in the Northern blot is probably due to the continuation of the 3'UTR downstream of the end of clone APG1. 3'-RACE from intestine RNA using oligonucleotides from clone APG1 (oligonucleotide primers AP26G, see above, and AP26K, 5'-GTCTGGCTTTGGCACTCTCTGC-3') revealed a PCR product of approximately 2.0 kb, which corresponds to an mRNA length of 3.8 kb, compatible with the results of the Northern blot analyses (data not shown). The
highest level of expression was observed in small intes- 
tine, but this gene is also expressed in human adult 
heart, placenta, lung, thymus, and prostate and in fetal 
brain and liver. Another weakly hybridizing mRNA 
species of 2.0 kb was also observed in several tissues. 
This could be due to alternative splicing, utilization of 
different transcription start sites and polyadenylation 
signals, overlapping transcripts, or, most likely, cross- 
hybridizing transcripts with sequence homologies with 
TMPRSS2. A human actin probe was used to control 
the amount of RNA loaded (data not shown). The expression of the TMPRSS2 gene appears to be developmentally regulated since there is strong expression in fetal brain but very little expression in adult brain. In addition, in the lung, expression is high in the adult tissue but low in the fetal tissue.

Type II Transmembrane Protein 

Protein prediction programs, which predict transmembrane domains, including http://ulrec3.unil.ch/software/TMPRED_form.html (Hofmann and Stoffel, 1993), suggested that amino acids 84-106 of TMPRSS2 were hydrophobic and likely to be a transmembrane domain (Figs. 1 and 5). This hydrophobic sequence is not preceded by a recognizable leader sequence. These findings are compatible with a type II integral membrane protein in which the amino-terminus is at the cytoplasmic side of the membrane (Parks and Lamb, 1993). These features (a type II integral membrane polypeptide with an extracellular protease domain) are similar to those of mammalian hepsins (Leytus et al., 1988; Tsuji et al., 1991). This latter protein is important for cell growth and maintenance of normal cell morphology (Kurachi et al., 1994); however, the underlying mechanisms for the biological activities are unknown.

FIG. 5. Schematic representation of the different domains of TMPRSS2. Numbers correspond to codons of the full-length cDNA shown in Fig. 1. For description of the domains see text.
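The idea behind predicting a transmembrane segment from hydrophobicity can be sketched with a sliding-window hydropathy average. Note that this swaps in the Kyte-Doolittle scale with a 19-residue window as a simple stand-in; it is not the TMpred algorithm cited above, and the function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every window of the given length along seq."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def best_window(seq: str, window: int = 19):
    """Return (1-based start, mean hydropathy) of the most hydrophobic window."""
    scores = window_scores(seq, window)
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


# Hypothetical toy protein: charged flanks around a 23-residue leucine run.
toy_protein = "DDDDD" + "L" * 23 + "KKKKK"
print(best_window(toy_protein))
```

Applied to a real protein sequence, a strongly positive window of roughly this length is the kind of signal that suggests a membrane-spanning helix such as the one proposed for residues 84-106.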

LDLRA Domain

In addition to the transmembrane domain, TMPRSS2 contains a protein motif of the so-called LDLRA (low-density lipoprotein receptor A) domain extending from Cys113 to Cys148 (Figs. 1 and 5). This structural motif (PDOC00929; http://www.expasy.ch/cgi-bin/get-prodoc-entry?PDOC00929) was found in the low-density lipoprotein receptor gene, which contains seven successive such domains (Südhof et al., 1985). A typical LDLRA domain is about 40 amino acids long and contains 6 disulfide-bonded cysteines (cysteines 113, 120, 126, 133, 139, and 148 in TMPRSS2). Similar domains have been found in both extracellular and membrane proteins, including the VLDL receptor; gp330; Drosophila putative vitellogenin receptor; human enterokinase; complement factor I; complement components C6, C7, C8, and C9; perlecan; PKD1; and vertebrate integral membrane protein DGCR2/IDD (Daly et al., 1995). The amino acid comparison of the single LDLRA domain of TMPRSS2 with other similar domains is shown in Fig. 6a. The predicted 3D structure of this domain and its comparison with the first such domain of the LDLR is shown in Fig. 7a. The LDLRA domains form the binding site for LDL and calcium; the acidic residues between the fourth and the sixth cysteines are important for high-affinity binding of positively charged sequences in LDLR ligands (van Driel et al., 1987; Mahley, 1988).
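The domain coordinates reported in the text can be collected into a small table and checked for internal consistency. The boundaries and cysteine positions below are taken from the paper; the dictionary layout and helper names are illustrative assumptions.

```python
PROTEIN_LENGTH = 492  # predicted amino acids, from the text

# Domain boundaries (inclusive, 1-based) as reported in the paper.
DOMAINS = {
    "transmembrane": (84, 106),
    "LDLRA": (113, 148),
    "SRCR": (149, 242),
    "serine protease": (255, 492),
}

# Conserved cysteines of the LDLRA domain, as listed in the text.
LDLRA_CYSTEINES = [113, 120, 126, 133, 139, 148]


def domain_length(name: str) -> int:
    """Number of residues spanned by a domain."""
    start, end = DOMAINS[name]
    return end - start + 1


def domains_overlap() -> bool:
    """True if any two domains share a residue."""
    spans = sorted(DOMAINS.values())
    return any(prev[1] >= nxt[0] for prev, nxt in zip(spans, spans[1:]))


def in_domain(position: int, name: str) -> bool:
    """True if a residue position falls inside the named domain."""
    start, end = DOMAINS[name]
    return start <= position <= end


for name in DOMAINS:
    print(f"{name}: {DOMAINS[name]}, {domain_length(name)} aa")
print("overlap:", domains_overlap())
print("cysteine spacing:", [b - a for a, b in zip(LDLRA_CYSTEINES, LDLRA_CYSTEINES[1:])])
```

The LDLRA span of 36 residues is in line with the "about 40 amino acids" quoted for a typical domain of this class, and all six listed cysteines fall within it.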

The SRCR Domain 

An SRCR domain (Resnick et al., 1994) was also identified in TMPRSS2 extending from Val149 to Leu242. SRCR domains are approximately 100 amino acids long and rich in cysteine. Comparison of more than 40 such domains from different proteins yielded a consensus sequence at 41 of 101 residues (Resnick et al., 1994). Two groups of SRCR domains are recognized, group A and group B, differing in the number of conserved cysteines. The SRCR domain of TMPRSS2 contains the pattern compatible with group A SRCR. The sequence homology to different examples of group A SRCR domains is shown in Fig. 6b. The SRCR domains were first found in type I macrophage scavenger receptor (Freeman et al., 1990) but subsequently in many other sequences (for a comprehensive list, see Resnick et al., 1994). The SRCR domain is reminiscent of but different from immunoglobulin domains. Proteins with SRCR domains are either at the cell surface or secreted into plasma or other body fluids. Some proteins such as the WC1 antigen or M130 contain nine or more such domains while others such as the MSR (macrophage scavenger receptor type I) and the secreted CF1 (complement factor 1) or cyclophilin C contain only one domain. The biochemical functions of the SRCR domain have not been established with certainty; however, most of these domains are involved with binding to the cell surface or extracellular molecules.

Protease Domain 

The most striking feature of the TMPRSS2 predicted polypeptide is its similarity with members of the serine protease family of proteins. The serine protease domain extends from amino acid residue Arg255 to the carboxyl-terminus of the predicted polypeptide. There is approximately 45-55% identity with several members of the serine protease family; the best similarities are with human hepsin (X07002), human enterokinase (P98073), and human kallikrein (P03952). The features of the protease domain of TMPRSS2 are compatible with the S1 family of the SA clan of serine-type peptidases as characterized by Rawlings and Barrett (1994). The prototype of this family is chymotrypsin and the 3D structure of some of its members has already been resolved. For a comprehensive list of the S1 serine-type peptidases see SWISS-PROT (http://www.expasy.ch/cgi-bin/lists?peptidas.txt). TMPRSS2 exhibits conservation of serine protease sequence motifs (Fig. 6c); in particular, the active site residues can be identified as His296, Asp345, and Ser441. TMPRSS2 is predicted to cleave after Lys or Arg residues since it contains Asp435 at the base of the specificity pocket (S1 subsite) that binds to the substrate. The predicted 3D structure of the protease domain of TMPRSS2 is shown in Fig. 7b. The protein model was built using the SWISS-MODEL server for automated comparative protein modeling (Peitsch, 1995, 1996) as described under Materials and Methods. It is of interest that TMPRSS2 is highly homologous to hepsin, another protease that contains a transmembrane domain and is thus a type II integral membrane protein with its protease domain in the extracellular space (Kurachi et al., 1994; Leytus et al., 1988; Tsuji et al., 1991). TMPRSS2 contains nine conserved cysteine residues which by homology to other proteases most likely form the following intrasubunit disulfide bonds Cys826-Cys842, Cys926-Cys993, Cys957-Cys972, and Cys983-Cys1011 and the intersubunit disulfide bond involving Cys758-Cys912, which probably joins the catalytic protease subunit with the nonprotease part of the polypeptide. The protease domain does not contain potential N-glycosylation sites while the remainder of the predicted polypeptide contains two such potential sites (N213, in the SRCR domain, and N249). The amino-terminal Ile of the protease domain is preceded by Arg in the context of a peptide sequence Arg-Ile-Val-Gly-Gly (RIVGG), which is typical for the proteolytic activator site of many serine protease zymogens (Rawlings and Barrett, 1994). The potential cleavage between Arg and Ile, which would be similar to the activation mechanism of other serine protease zymogens, would convert TMPRSS2 to an activated form consisting of a nonprotease and a protease catalytic subunit linked by a disulfide bond that most probably involves Cys758 and Cys912.
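The proposed activation step, cleavage between Arg and Ile within the RIVGG motif, can be sketched as a simple string operation. The motif, the protease domain boundaries, and the catalytic triad positions are taken from the text; the function name and the short example peptide are hypothetical and do not represent the TMPRSS2 sequence.

```python
import re

PROTEASE_DOMAIN = (255, 492)                             # from the text
CATALYTIC_TRIAD = {"His": 296, "Asp": 345, "Ser": 441}   # from the text


def split_at_activation_site(protein: str, motif: str = "RIVGG"):
    """Split a zymogen after the first residue of the activation motif.

    For RIVGG this cuts between Arg and Ile, returning the nonprotease
    part (ending in Arg) and the protease part (starting with Ile).
    Returns None if the motif is absent.
    """
    pattern = re.escape(motif[0]) + "(?=" + re.escape(motif[1:]) + ")"
    match = re.search(pattern, protein)
    if match is None:
        return None
    cut = match.end()
    return protein[:cut], protein[cut:]


# Hypothetical toy peptide containing the activation motif.
toy_zymogen = "MKTCSRIVGGESA"
print(split_at_activation_site(toy_zymogen))

# The reported catalytic residues should all lie in the protease domain.
start, end = PROTEASE_DOMAIN
print(all(start <= pos <= end for pos in CATALYTIC_TRIAD.values()))
```

In the model described above, the two fragments would remain covalently attached through the intersubunit disulfide bond, which a plain string split does not represent.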

DISCUSSION 

In this paper we describe the cloning, chromosomal 
mapping, and initial characterization of a novel gene 
that maps on human chromosome 21q22.3 and encodes 
a polypeptide with multiple recognizable domains, 
namely LDLRA, SRCR, and serine protease domains. 
In addition, the presence of a transmembrane domain 
and the absence of a signal peptide suggest that this is 
a type II integral membrane protein. More biochemical 
experiments are necessary to further characterize the 
cellular localization of this protein and its physiological 
function. The biochemical events for the activation of 
the probable serine protease activity are unknown but 
are likely to be similar to those described above. It is of 
interest that the predicted TMPRSS2 protein contains 
additional domains (LDLRA and SRCR) that are poten- 
tially involved in binding with extracellular molecules 
or the cell surface. The molecules that are cleaved by or 
that bind to TMPRSS2 are unknown. There are several 
tissues that are shown by Northern blot analysis to 
express the TMPRSS2 gene. The site of the strongest 
expression is the small intestine; however, other tis- 
sues including heart, lung, and liver also showed a sig- 
nificant amount of TMPRSS2 mRNA. The function of 
this protein in these tissues remains elusive. 

Are there any monogenic disorders associated with TMPRSS2? Several monogenic phenotypes due to mutations in unknown genes have been mapped by linkage analysis to chromosome 21q22.3; these include APECED (Aaltonen et al., 1994; OMIM 240300), an autoimmune disorder; two forms of autosomal recessive deafness (Bonne-Tamir et al., 1996; Veske et al., 1996; OMIM 601072); Knobloch syndrome (Sertie et al., 1996; OMIM 267750); one locus for manic depressive illness (Smyth et al., 1997; OMIM 125480); and one locus for holoprosencephaly (Muenke et al., 1995; OMIM 236100). All of these phenotypes are mapped more distal to TMPRSS2, and it is therefore unlikely that TMPRSS2 is a candidate gene for any of these disorders.

Many human disorders are due to deficiency of other serine proteases. For example, deficiencies of coagulation factors such as Factor XII (OMIM 234000), Factor X (OMIM 227600), Factor IX (OMIM 306900), and Factor VII (OMIM 227500) belong to these disorders. Additional examples of such disorders are enterokinase deficiency (Hadorn et al., 1969; OMIM 226200), trypsinogen deficiency (Townes, 1965; OMIM 276000), and hereditary pancreatitis due to mutations in the cationic trypsinogen gene (Whitcomb et al., 1996). The generation of mice with targeted disruption of the mouse TMPRSS2 gene will enhance our understanding of the function of this gene and will provide candidate phenotypes for further investigation.

Is the overexpression of three copies of TMPRSS2 involved in one of the phenotypes of Down syndrome? TMPRSS2 maps outside the so-called Down syndrome critical region (DSCR; between markers D21S17 and ETS2), triplication of which is associated with many phenotypes of Down syndrome (Delabar et al., 1993). However, the existence of a single DSCR has recently been challenged since rare patients with proximal trisomy 21 not including the D21S17-ETS2 region displayed some of the phenotypes of Down syndrome (Korenberg et al., 1994). In addition, a wider region from D21S17 to and including MX1 was associated with several phenotypes, including the heart defect and some dysmorphic features of the syndrome (Delabar et al., 1993; Korenberg et al., 1994). Since the TMPRSS2 gene is within this interval, it is formally a candidate for some phenotype(s) of Down syndrome. Transgenic mice
that overexpress the murine extracellular protein uro- 
kinase-type plasminogen activator have been shown 
to exhibit abnormal phenotypes (learning disabilities) 
(Meiri et al, 1994). The study of transgenic mice that 
overexpress the murine homologue of the human TMP- 
RSS2 gene may contribute to the understanding of the 
potential involvement of this gene in the pathogenesis 
of Down syndrome. A mouse model with partial trisomy 
16 (which corresponds to a partial human trisomy 21 
from APP to MX1) has recently been made (Reeves et 
al, 1995). It would be of interest to know if the murine 
homologue of the TMPRSS2 gene is included in the 
triplicated part of mouse chromosome 16. 